A start at opening the door to support for other languages. #103

paul90 · 2015-03-07T19:54:17Z

Just a start for review

A link to Åsnes generates the request URL: http://localhost:3000/snes.json?random=14cc3a4d&title=%C3%85snes

The title parameter gives a server something to work with when finding the page's data.
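
For review purposes, here is a minimal Python sketch of how such a request URL comes together. The `ascii_slug` function is an assumption modelled on the URL shown above (it drops every non-ASCII letter, which is why Åsnes becomes snes); `page_request_url` and its parameter names are hypothetical and are not the client's actual code.

```python
import re
from urllib.parse import urlencode


def ascii_slug(title: str) -> str:
    """ASCII-only slug (assumed behaviour): whitespace becomes '-',
    every other character outside A-Z, a-z, 0-9 and '-' is dropped."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def page_request_url(origin: str, title: str, random_token: str) -> str:
    """Build the page request, passing the untouched title as a query parameter."""
    query = urlencode({"random": random_token, "title": title})
    return f"{origin}/{ascii_slug(title)}.json?{query}"


url = page_request_url("http://localhost:3000", "\u00c5snes", "14cc3a4d")
print(url)
```

The slug loses the leading letter, but the percent-encoded title still carries it, so a server can recover the intended page name from the query string.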

WardCunningham · 2015-03-08T04:35:28Z

It would seem that we no longer need the link attribute, data-page-name="#{slug}", applied in lib/resolve.coffee. I'm not sure how to test that this is true.

paul90 · 2015-03-08T05:39:47Z

Maybe a different tack...

When we are fetching a page we already have a title to slug lookup table in the sitemap. So, the only real issues are with page creation, and ensuring that the sitemap is up to date. As we only create pages either on the origin server, or in local storage, there is little reason why this change could not be a breaking one - and introduce a level of cooperation between the client and server when a page is initially created, or forked.

WardCunningham · 2015-03-08T20:43:46Z

Good point. When a sitemap is present and the link text matches a sitemap entry then this mapping should override that of asStub. We should define text matching to be a case independent match when we know enough about the character set to apply this transformation.
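
A rough sketch of that override in Python, under some assumptions: sitemap entries are taken to be dictionaries with `title` and `slug` keys, `default_slug` stands in for the existing slug function, and "case independent" is approximated with Unicode normalization plus `str.casefold()`. All names here are hypothetical.

```python
import re
import unicodedata


def default_slug(title: str) -> str:
    """Stand-in for the existing ASCII-only slug function (assumed behaviour)."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def match_key(text: str) -> str:
    """Normalize and case-fold so that titles compare case-independently."""
    return unicodedata.normalize("NFC", text).casefold()


def resolve_slug(link_text: str, sitemap: list) -> str:
    """Prefer the sitemap's slug when the link text matches an entry's title."""
    wanted = match_key(link_text)
    for entry in sitemap:
        if match_key(entry["title"]) == wanted:
            return entry["slug"]
    return default_slug(link_text)


sitemap = [
    {"slug": "asnes", "title": "\u00c5snes"},
    {"slug": "world", "title": "\u4e16\u754c"},
]
print(resolve_slug("\u00e5snes", sitemap))
print(resolve_slug("Welcome Visitors", sitemap))
```

For scripts without case, the fold is a no-op and the match is simply exact after normalization.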

We now have at least two ways to "paint our way" out of this corner. I don't see any problem with applying both when possible.

egonelbre · 2015-03-25T15:04:46Z

I dislike the necessity of having the sitemap. Currently one of my sitemaps is 1MB, which isn't huge, but certainly an inconvenience on mobile devices. E.g. what if you wanted to use wikipedia.org as a part of federated wiki?

Maybe we could let the servers handle resolving the names instead? For example, when you request "/世界" the server would use an HTTP redirect to send you to "/world". The normalized slug / url would also always be present in the page itself, so that when a new link is created it would always refer to the correct slug/title.

WardCunningham · 2015-03-25T15:24:53Z

Maybe you could try the paul90/utf-8-pagename branch and tell us how it works for you?

I would be interested in how interoperable you could make your server. When provided a title you would have all the information that the client has. You would be responsible for case-insensitive matching for those alphabets that have case. You would also be expected to match conventional slugs when handling CORS requests from clients that don't have this modification.

I still think the sitemap approach has merit and would allow a server to number pages as some wiki implementations want to do. If you choose to not offer a sitemap then this indirection will not be available to you. I haven't thought through this case carefully yet. It seems that employing an alternate slug algorithm has similar interoperability considerations.

egonelbre · 2015-03-25T15:42:11Z

I took a look at the changes, and it doesn't seem to have the history copying - but of course it could be added. I simply cannot see whether that approach would look good on the titlebar. Also I'm using a different client.

The handling on the server side would be pretty trivial, i.e.:

check whether you have the exact page needed
check whether you have the page with your own slugification approach (i.e. normalize, remove spaces ... whatever you consider equivalent)
check whether any of the pages has the old-format-slug
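
The three checks above could be sketched roughly as follows. This is only an illustration: `server_slug` is a made-up normalization standing in for "your own slugification approach", `legacy_slug` is an assumed version of the old ASCII-only format, and pages are assumed to live in a dictionary keyed by stored name.

```python
import re
import unicodedata


def legacy_slug(title: str) -> str:
    """Old-format slug (assumed): ASCII letters, digits and '-' only."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def server_slug(title: str) -> str:
    """Hypothetical server-side normalization: NFKC, case-fold, spaces to '-'."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    return re.sub(r"\s+", "-", folded.strip())


def find_page(requested: str, pages: dict):
    """Return the stored name that answers the request, or None."""
    # 1. Exact match on the stored name.
    if requested in pages:
        return requested
    # 2. Match after applying the server's own slugification.
    own = server_slug(requested)
    if own in pages:
        return own
    # 3. Match against the old-format slug of every stored page.
    old = legacy_slug(requested)
    if old:
        for name in pages:
            if legacy_slug(name) == old:
                return name
    return None


stored = server_slug("\u00c5snes")
pages = {stored: {"title": "\u00c5snes"}, "world": {"title": "\u4e16\u754c"}}
print(find_page("snes", pages))
```

Step 3 is what keeps unmodified clients working, at the cost of a scan over stored names; a real server would probably keep a precomputed lookup table instead.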

WardCunningham · 2015-03-25T15:50:57Z

How different is your client?

egonelbre · 2015-03-25T16:05:49Z

The current progress is here: https://github.com/raintreeinc/kbclient. Not complete and I haven't yet had time to completely fix some of the issues. I'm currently still migrating the first prototype to the new infrastructure. Essentially the client is a completely different codebase, but behavior is similar.

Also, it has different goals than fedwiki, so it doesn't have (exactly) the same feature-set. Of course I try to keep it inter-operable with fedwiki protocol. The gist of the KnowledgeBase is "federated wiki between different people groups". Essentially this is an effort to merge usual help pages with engineering information and user provided wiki information.

I liked the navigation style of the Federated Wiki client, so I started from there. It provides a very fast way of navigating complex information. The Federation part gives really good boundaries for managing different wikis and their security, but at the same time allows one-way access.

WardCunningham · 2015-03-25T16:09:41Z

Another approach might be for the client to try a unicode-friendly slug first and if that doesn't work, try the original slug when different. Can you provide us a table of sample titles and how they would convert to the two slug formats in question?
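
A small sketch of that fallback, assuming a hypothetical `unicode_slug` that keeps letters and digits from any script and an assumed `legacy_slug` for the original format. Neither is taken from an existing client; they only illustrate the order of attempts.

```python
import re


def legacy_slug(title: str) -> str:
    """Original-format slug (assumed): ASCII letters, digits and '-' only."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def unicode_slug(title: str) -> str:
    """Hypothetical Unicode-friendly slug: keep letters and digits of any
    script, turn whitespace into '-', drop the rest, tidy up dashes."""
    chars = []
    for ch in title.casefold():
        if ch.isalnum():
            chars.append(ch)
        elif ch.isspace() or ch == "-":
            chars.append("-")
    return re.sub(r"-+", "-", "".join(chars)).strip("-")


def candidate_slugs(title: str) -> list:
    """Unicode-friendly slug first, then the original slug when it differs."""
    candidates = []
    for slug in (unicode_slug(title), legacy_slug(title)):
        if slug and slug not in candidates:
            candidates.append(slug)
    return candidates


def fetch_first(title: str, fetch):
    """Try each candidate with the supplied fetch callable until one succeeds."""
    for slug in candidate_slugs(title):
        page = fetch(slug)
        if page is not None:
            return page
    return None


old_server = {"snes": {"title": "\u00c5snes"}}
print(candidate_slugs("\u00c5snes"))
print(fetch_first("\u00c5snes", old_server.get))
```

For plain ASCII titles both slugs coincide, so only one request is made; the second attempt only costs something for titles the original format mangles.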

egonelbre · 2015-03-25T16:20:47Z

These should give the general idea https://github.com/egonelbre/fedwiki/blob/master/slug_test.go#L9. kbclient contains the slugification code for javascript as well.

egonelbre · 2015-03-26T07:33:16Z

I'm worried about my suggested slugification as well... Mainly, there may be some letters that aren't present in the current Unicode tables, which means that in the future you would need to start upgrading those tables for clients, otherwise it won't work for all cases. With the server-based approach you don't have that problem, because the server knows what pages it has and can appropriately discard those pages.

Of course there is a problem with the Server approach - if you update your naming/slugification scheme links to your site will break.

WardCunningham · 2015-03-26T14:58:59Z

@egonelbre I thought your slugs looked nice and handled some edge cases like duplicate or leading/trailing dashes that were meant for the original but didn't get implemented in the rush to get the project started.

I notice that you spell out some symbols. Is that to avoid specific meaning in a url while avoiding the %xx notation?

I also notice that you pass the slash (/) with some reference to hierarchical names. Are they a requirement of your application? Or is this something you like from other wiki? I have avoided namespace concepts hoping that federation would be sufficient and more "natural".

egonelbre · 2015-03-26T15:31:35Z

The reason I am replacing symbols is to avoid problems with URLs. Essentially I had titles such as * operator, @ operator, so simply removing them wasn't an option. I don't like replacing them with %xx also, because they won't look nice.

I initially was keeping the slash because I had multiple federation end-points under a single address. E.g. /help/, /wiki/, /dev/ of course now I'm changing it to properly distributed. The main use-case I can see for paths is generated services and converted pages:

you are exporting a directory as a federated server
you take an existing html page structure and convert to federated wiki

One case I have is that we have multiple application versions and each one has a separate version of the help pages - so it would be nice to serve them from under help.raintreeinc.com/500, help.raintreeinc.com/400... instead of help500.raintreeinc.com (or something similar)... that approach also reduces the DNS maintenance overhead.

Essentially regarding /, I haven't yet decided either way - it is useful for specific cases, but when you are creating a personal wiki it isn't necessary.

egonelbre · 2015-03-26T15:36:29Z

Forgot one of my use cases... providing a citations listing, e.g. /citations/xyz is a page that links to all pages that cite or reference page /xyz. I can't do that on the client side because the whole site content is +100MB. Similarly /tags/xyz, for all pages that have a tag xyz.
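
To make the idea concrete, here is a rough server-side sketch of building such a citations listing. It assumes pages are JSON-like dictionaries with a `story` list whose items carry a `text` field containing `[[Title]]` links; `as_slug` and `citations_index` are hypothetical names, not part of any existing server.

```python
import re
from collections import defaultdict

LINK = re.compile(r"\[\[([^\]]+)\]\]")


def as_slug(title: str) -> str:
    """Assumed ASCII-only slug, used here only to key the index."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def citations_index(pages: dict) -> dict:
    """Map each linked-to slug to the sorted list of slugs that cite it."""
    index = defaultdict(set)
    for slug, page in pages.items():
        for item in page.get("story", []):
            for title in LINK.findall(item.get("text", "")):
                index[as_slug(title)].add(slug)
    return {target: sorted(sources) for target, sources in index.items()}


pages = {
    "home": {"story": [{"type": "paragraph", "text": "See [[XYZ]] and [[Other Page]]."}]},
    "notes": {"story": [{"type": "paragraph", "text": "More on [[xyz]]."}]},
    "xyz": {"story": [{"type": "paragraph", "text": "No links here."}]},
}
index = citations_index(pages)
print(index["xyz"])
```

The entry for `xyz` is what a generated /citations/xyz page would list.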

paul90 · 2015-03-26T18:12:55Z

Somewhat off topic, regarding slugs and internationalization, but related

Elsewhere, WardCunningham/Smallest-Federated-Wiki#412, there are some initial thoughts on changing the story serialization. This is somewhat connected insofar as it might open a few possibilities. While I'm not sure that I would go quite the same route having thought about this for nearly a year I would probably use something more like http://fed.wiki.org/#welcome-visitors&[email protected]. Which is what I've used with the narrative view.

Not quite sure how services like citations and tags would mix with the current URL structure - while they could replace view on the origin server, wanting to request them from others servers is more tricky, as we then have either 2 or 3 parts for each page in the url. Add in hierarchy support and there could not only be more, but some ambiguity if we have a hierarchy or are calling a service to create the page.

I'm really not sure if I've seen a slugify routine that I am really happy with. They all seem to have some shortcomings, the best remove accents rather than dropping letters as well as replacing ligatures.
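
For reference, accent removal of the kind the better routines do can be sketched with Python's standard `unicodedata` module. `strip_accents` is a hypothetical helper name; the approach is compatibility decomposition (NFKD) followed by dropping combining marks.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks, so accented
    letters keep their base letter and compatibility ligatures are expanded."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("\u00c5snes"))   # A with ring above keeps its base letter
print(strip_accents("\ufb01n"))      # the 'fi' ligature is expanded
print(strip_accents("\u00e6"))       # 'ae' has no decomposition and is unchanged
print(strip_accents("\u4e16\u754c")) # CJK text has no accents to remove
```

This only helps for scripts built on a base letter plus marks. Letters without a decomposition and non-Latin scripts pass through unchanged, so an ASCII-only slug would still drop them afterwards.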

I wonder if it would not be better to simply URL encode the page title - most modern browsers will do their thing and make it readable. It is then up to the server how it maps this onto the name it will use for storage. The big downside is that the slug is used to refer to the page in many places, so this would be a big change.

egonelbre · 2015-03-26T18:32:01Z

The citations and tags wouldn't really be separate services they are simply regular pages just with a specific structure. Other clients don't have to know that there even are special pages - if they know that they exist, they can take advantage of them. Also a correction, the full path I'm using is /index/citations/xyz and /index/tags/xyz.

I'm currently using http://localhost/#┃/home┃/1-up-labels┃//fed.wiki.org/welcome-visitors for the lineup (copy paste to location-bar, it looks better in there). Removing accents doesn't really work for 世界. And yes, I have to agree with slugification on the client side, it's just hard to make sure it's future proof and later changing it would cause even more problems.

Although I also tried the @ approach, but it seemed that //server/page looks more readable. Although using an @ would enable to use [email protected]/search... Although specifying links to them would be more complicated... also it seems like reinventing URIs.

I'm not exactly sure what do you mean by that downside of "slug is used to refer to the page in many places". Could you elaborate?

paul90 · 2015-03-26T20:16:08Z

> I'm not exactly sure what do you mean by that downside of "slug is used to refer to the page in many places". Could you elaborate?

Just musing that if we moved to using url encoding for making server requests, and doing the slug generation on the server, that there are knock-on effects in the client.

The more I look at this, the more I get the feeling that all the slugify routines are a relic from the past, before any internationalization of the web. The different national versions of Wikipedia look to use a mix of URL encoding and a minimal set of character replacements (replacing a space with _ being the obvious one). So they have URLs like https://el.wikipedia.org/wiki/%CE%91%CE%BD%CE%B4%CF%81%CE%AD%CE%B1%CF%82_%CE%9C%CE%B9%CE%B1%CE%BF%CF%8D%CE%BB%CE%B7%CF%82 which is rendered in the address bar as https://el.wikipedia.org/wiki/Ανδρέας_Μιαούλης, and of course I could just type the Greek in the address bar. The server would have to do some work to translate the URL encoding into a storage location.
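
A minimal sketch of that Wikipedia-style mapping, using only Python's standard `urllib.parse`. The function names are hypothetical, and the space-to-underscore rule is the only replacement applied; how the decoded title maps to a storage location is left to the server.

```python
from urllib.parse import quote, unquote


def path_from_title(title: str) -> str:
    """Replace spaces with '_' and percent-encode the rest as UTF-8."""
    return quote(title.replace(" ", "_"), safe="")


def title_from_path(segment: str) -> str:
    """Reverse the mapping: percent-decode, then turn '_' back into spaces."""
    return unquote(segment).replace("_", " ")


encoded = (
    "%CE%91%CE%BD%CE%B4%CF%81%CE%AD%CE%B1%CF%82"
    "_%CE%9C%CE%B9%CE%B1%CE%BF%CF%8D%CE%BB%CE%B7%CF%82"
)
title = title_from_path(encoded)
print(title)
print(path_from_title(title) == encoded)
```

One caveat with this simple rule: a title that really contains an underscore cannot be told apart from one that contains a space.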

paul90 · 2016-07-11T16:04:55Z

I'm closing this, as there has been no activity in over a year.

replaid · 2022-10-20T15:08:32Z

I am creating a wiki farm for an international project related to permaculture. The ability to title pages with non-Latin characters is an absolute must. Absolutely everything else about Federated Wiki is squarely aligned with this project's needs.

Naively, just jumping into this thread 6 years after it was closed, I like the approach of imitating Wikipedia's URL-encoding-based solution on the front end and letting the back end worry about how to persist it. Punycode would be another possible encoding that might offer a path for compatibility.

I am motivated to fix this for my community and would really love to do so in a way that fixes it for the world and fits with this project.

almereyda · 2022-10-21T13:32:50Z

Hi @replaid !

Glad you are joining the conversation here. To ease the conversation and development process, though, I would suggest to open up a new issue, preferably in the overarching project https://github.com/fedwiki/wiki/issues (as GitHub discussions are disabled).

I am myself responsible for taking care of the legacy from Silke Helfrich and her fabulous work with David Bollier on the pattern language of commoning, published in Federated Wikis at:

There are a few more language editions coming up next (French, Spanish, Greek, and what have you)

http://editions.commoning.wiki/

We are entering uncharted land here, so it will be on us to provide the Wiki community not only with requirements, but also with good practices, possible workarounds and eventually original development. The subjects of translation, and more broadly, internationalisation (i18n) will bring much joy and nuts to crack, so we ought to continue to work slowly, but surely on this.

I went ahead and filed a new issue for this subject:

Internationalisation (i18n) wiki#139

Please feel free to continue our conversation there.

WardCunningham · 2022-10-22T17:50:39Z

@replaid can you give us some examples of "pages titled with non-Latin characters an absolute must." Are you expecting collaboration to work across languages? How? For the sake of this conversation, perhaps you can give us English translations too.

replaid · 2022-10-22T20:52:21Z

The author below is compiling permaculture articles in Russian and we anticipate collaboration between speakers of Russian, English, and Spanish in the near term, possibly other languages as well (the next one on the list would be right-to-left…). This would be titled "Guilds" in English.

Here it is with an English-language translation too:

http://john.permakultura.wiki/view/welcome-visitors/miron.permakultura.wiki/gildii/view/guilds

The limitation of Latin letters for the page title is the only roadblock I am aware of to having a normal wiki workflow and interaction. But we anticipate a fairly broad-based selection, i.e. not all academics—I believe it will be pretty important to be able to support a fully native-language workflow.

replaid · 2022-10-22T21:16:37Z

I would want to name this page in its own script and cite it the same way: [[Гильдии]]. Writers in other languages are used to flipping their keyboard back and forth to type web addresses, etc., so this won't represent much of a problem. It would be nice to be able to make links using keys that are available on the native language (Russian has more letters, so curly braces and square brackets aren't easily available), but this is a far lower priority than being able to use non-Latin letters in page titles.

Russian is a phonetic alphabet, and can be very effectively transliterated into Latin letters (in this case Gil'dii)—this happens in the slugs on Russian news sites—but I don't know if it's best to pursue that path across languages.

I have been looking at Punycode, Nameprep, and stringprep and I think it has potential—the main thing I'm aware of that I need to think through is how it would interact with wiki when searching for pages with the use of substrings. Otherwise I think it's very close to our needs with a similar history of starting with a Latin-only system and extending into Unicode with security concerns, etc.
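
As a rough illustration of the substring question, here is a sketch using the `punycode` codec from Python's standard library (the bare encoding only, without Nameprep/stringprep or the `xn--` prefix used for domain names). The helper names are hypothetical. The point is that the ASCII form is opaque, so a substring search would have to decode labels back to Unicode before comparing.

```python
def to_ascii_label(title: str) -> str:
    """Encode a Unicode title as an ASCII-only Punycode string."""
    return title.encode("punycode").decode("ascii")


def from_ascii_label(label: str) -> str:
    """Decode a Punycode string back to the original Unicode title."""
    return label.encode("ascii").decode("punycode")


def search_titles(labels: list, needle: str) -> list:
    """Substring search over stored labels, done on the decoded titles."""
    wanted = needle.casefold()
    titles = [from_ascii_label(label) for label in labels]
    return [title for title in titles if wanted in title.casefold()]


guilds = "\u0413\u0438\u043b\u044c\u0434\u0438\u0438"  # the Russian word for "Guilds"
wiki = "\u0423\u0438\u043a\u0438 \u0423\u0438\u043a\u0438 \u0423\u0435\u0431"
stored = [to_ascii_label(guilds), to_ascii_label(wiki)]
print(stored)
print(search_titles(stored, "\u043b\u044c\u0434"))
```

Storing the Unicode title alongside the encoded name would avoid decoding on every search.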

WardCunningham · 2022-10-22T23:12:16Z

Would an author want to identify Gil'dii as a synonym for Гильдии?

paul90 · 2022-10-23T07:19:53Z

Internationalisation is not just an issue for the client. To ensure that all aspects are covered it will be best to discuss this subject over in fedwiki/wiki#139, rather than in this closed PR that was just a start at exploring one small corner of this issue.

replaid · 2022-10-23T12:49:19Z

> Would an author want to identify Gil'dii as a synonym for Гильдии?

In terms of its value to the user experience, this would be like asking an English speaker to please type Уики Уики Уеб somewhere as a synonym for Wiki Wiki Web because the author of the software was Russian.

I've created an issue specifically for page titles at fedwiki/wiki#140. I'm optimistic that this can be done in a compatible way without a huge lift.

A start at opening the door to support for other languages.

bea9292

WardCunningham mentioned this pull request Mar 25, 2015

Unicode titles #107

Closed

WardCunningham mentioned this pull request Jul 23, 2015

Pages with Unicode Titles Don't Work fedwiki/wiki#54

Closed

paul90 closed this Jul 11, 2016

replaid mentioned this pull request Oct 23, 2022

Internationalization of page titles fedwiki/wiki#140

Open

replaid mentioned this pull request Oct 23, 2022

Internationalisation (i18n) fedwiki/wiki#139

Open
